Automatic Discovery of Linguistic Patterns for Information Extraction

نویسندگان

  • Sanda M. Harabagiu
  • Mihai Surdeanu
  • Paul Morarescu
چکیده

Information Extraction (IE) systems typically rely on extraction patterns encoding domain-specific knowledge. When matched against natural language texts, these patterns recognize with high accuracy information relevant to the extraction task. Adapting an IE system to a new extraction scenario entails devising a new collection of extraction patterns a time-consuming and expensive process. To overcome this obstacle, we have implemented in CICERO, our IE system, a pattern acquisition mechanism that combines lexicosemantic knowledge available from WordNet with syntactic information collected from training corpora. The open-domain nature of the knowledge encoded in WordNet grants portability of our approach across multiple extraction domains.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Method for Extracting Causal Knowledge from Textual Databases

This paper describes the first phase of a project to develop a knowledge extraction and knowledge discovery system that extracts causal knowledge from a textual database automatically, and attempts to infer new causal relationships from the extracted information. The initial work is focused on developing an automatic method for identifying and extracting cause-effect information expressed in me...

متن کامل

Combinaison d'approches pour l'extraction automatique d'événements (Automatic events extraction by combining multiple approaches) [in French]

Automatic events extraction by combining multiple approaches In this paper, we present an automatic system for extracting events based on the combination of two existing information extraction approaches : the first one is made of hand-crafted linguistic rules and the second one is based on an automatic learning of linguistic patterns. We have shown that this mixed approach leads to a significa...

متن کامل

Acquisition of Linguistic Patterns for Knowledge-based Information Extraction

In this paper we present a new method of automatic acquisition of linguistic patterns for Information Extraction, as implemented in the CICERO system. Our approach combines lexico-semantic information available from the WordNet database with collocating data extracted from training corpora. Due to the open-domain nature of the WordNet information and the immediate availability of large collecti...

متن کامل

A Domain-Independent Approach to IE Rule Development

A key element for the extraction of information in a natural language document is a set of shallow text analysis rules, which are typically based on pre-defined linguistic patterns. Current Information Extraction research aims at the automatic or semi-automatic acquisition of these rules. Within this research framework, we consider in this paper the potential for acquiring generic extraction pa...

متن کامل

Lexical Patterns or Dependency Patterns: Which Is Better for Hypernym Extraction?

We compare two different types of extraction patterns for automatically deriving semantic information from text: lexical patterns, built from words and word class information, and dependency patterns with syntactic information obtained from a full parser. We are particularly interested in whether the richer linguistic information provided by a parser allows for a better performance of subsequen...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001